Efficient numerical diagonalization of hermitian 3x3 matrices 
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A very common problem in science is the numerical diagonalization of symmetric or hermitian 3x3 
matrices. Since standard "black box" packages may be too inefficient if the number of matrices is 
large, we study several alternatives. We consider optimized implementations of the Jacobi, QL, and 
Cuppen algorithms and compare them with an analytical method relying on Cardano's formula for 
the eigenvalues and on vector cross products for the eigenvectors. Jacobi is the most accurate, but 
also the slowest method, while QL and Cuppen are good general purpose algorithms. The analytical 
algorithm outperforms the others by more than a factor of 2, but becomes inaccurate or may even 
fail completely if the matrix entries differ greatly in magnitude. This can mostly be circumvented by 
using a hybrid method, which falls back to QL if conditions are such that the analytical calculation 
might become too inaccurate. For all algorithms, we give an overview of the underlying mathematical 
ideas, and present detailed benchmark results. C and Fortran implementations of our code are 
available for download from http://www.mpi-hd.mpg.de/-~globes/3x3/. 



1. INTRODUCTION 

In many scientific problems, the numerical diagonaliza- 
tion of a large number of symmetric or hermitian 3x3 
matrices plays a central role. For a matrix A, this means 
calculating a set of eigenvalues Xi and eigenvectors Vj, 
satisfying 



Avi = Xi\i 



(1) 



An example from classical mechanics or molecular sci- 
ence is the determination of the principal axes of a solid 
object [1]. The author's interest in the problem arises 
from the numerical computation of neutrino oscillation 
probabilities in matter [2-4], which requires the diago- 
nalization of the Hamiltonian operator 
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Here, U is the leptonic mixing matrix, Am^i and Am§ x 
are the differences of the squared neutrino masses, and V 
is the MSW (Mikheyev-Smirnov-Wolfenstein) Potential 
describing coherent forward scattering in matter. If cer- 
tain non-standard physics contributions are considered, 
the MSW matrix can also contain more than one non- 
zero entry [5] . 

There exist many publicly available software pack- 
ages for the calculation of matrix eigensystems, e.g. LA- 
PACK [6], the GNU Scientific Library [7], or the Nu- 
merical Recipes algorithms [8]. These packages exhibit 
excellent accuracy, but being designed mainly for very 
large matrices, they may produce a lot of computational 
overhead in the simple 3x3 case. This overhead comes 
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partly from the algorithms themselves, and partly from 
the implementational details. 

In this letter, we will study the performance of several 
algorithms which were optimized specifically for 3x3 
matrices. We will discuss the well-known Jacobi, QL 
and Cuppen algorithms, and compare their speed and 
accuracy to that of a direct analytical calculation using 
Cardano's formula for the eigenvalues, and vector cross 
products for the eigenvectors. The application of Car- 
dano's formula to the 3x3 eigenproblem has been sug- 
gested previously in [9] , and formulas for the eigenvectors 
based on the computation of the Euler angles have been 
presented in [10], 

The outline of the paper is as follows: In Sees. 2 
and 3, we will describe the mathematical background 
of the considered algorithms as well as the most im- 
portant implementational issues, and discuss their nu- 
merical properties. In Sec. 4, we will briefly mention 
some other algorithms capable of solving the 3x3 eigen- 
problem, and give reasons why we do not consider them 
to be the optimal choice for such small matrices. Our 
purely theoretical discussion will be complemented in 
Sec. 5 by the presentation of detailed benchmark re- 
sults. Finally, we will draw our conclusions in Sec. 6. 
The appendix contains two alternative derivations of 
Cardano's formulas, and the documentation of our C 
and Fortran code, which is available for download from 
http : //www . mpi-hd . mpg . de/^globes/ 3x3/. 



2. ITERATIVE ALGORITHMS 
2.1. The Jacobi method 

One of the oldest methods for the diagonalization of an 
arbitrary symmetric or hermitian n x n matrix A is the 
Jacobi algorithm. Discussions of this algorithm can be 
found in [8, 11-14]. Its basic idea is to iteratively zero the 
off-diagonal elements of A by unitary transformations of 
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the form 



/l 



c • • • se 
: 1 : 



-se 



(3) 



V V 

The matrices P pq differ from the unit matrix only in the 
(pp), (pq), [qp), and (qq) elements; c and s are required 
to satisfy s 2 + c 2 = 1 , and can thus be expressed in the 
form 



c = cos < 
s = sin c 



(4) 



for some real angle <j>. The complex phase a is absent if 
A has only real entries. <f> and a are chosen in such a 
way that the similarity transformation 

A — > pl q A P p9 (5) 

eliminates the (pq) element (and thus also the (qp) ele- 
ment) of A. Of course it will in general become nonzero 
again in the next iteration, where p or q is different, but 
one can show that the iteration (5) converges to a diago- 
nal matrix, with the eigenvalues of A along the diagonal, 
if p and q cycle through the rows or columns of A [14]. 
The normalized eigenvectors are given by the columns of 
the matrix 



Q 



p p p 



(6) 



2.2. The QR and QL algorithms 

The QR and QL algorithms are among the most widely 
used methods in large scale linear algebra because they 
are very efficient and accurate although their implemen- 
tation is a bit more involved than that of the Jacobi 
method [8, 12, 15]. They are only competitive if applied 
to real symmetric tridiagonal matrices of the form 
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Therefore, as a preliminary step, A has to be brought to 
this form. 



2.2.1. Reduction of A to tridiagonal form 

There are two main ways to accomplish the tridiago- 
nalization: The Givens method, which consists of succes- 
sively applying plane rotations of the form of Eq. (3) (in 



contrast to the Jacobi reduction to diagonal form, the 
Givens reduction to tridiagonal form is non- iterative), 
and the Householder method, which we will discuss here. 
A Householder transformation is defined by the unitary 
transformation matrix 



with 



and 



P = I - wuu' 



u = x =F |x e$ 



1 

|u| : 



UX 



(8) 



(9) 



(10) 



Here, x is arbitrary for the moment, and ej denotes the 
i-th unit vector. From a purely mathematical point of 
view, the choice of sign in Eq. (9) is arbitrary, but in the 
actual implementation we choose it to be equal to the 
sign of the real part of X{ to avoid cancellation errors. P 
has the property that Px <~ because 



(I — cjuu^)x = X 



uu'x 



2|x|2 T |x| (*,+<) 
(ulx + xt u )(x=F l x l e ») 

2|x|2 T | X |( a ; J + <) 

(|x| 2 T|x| a! i + |x| 2 T|x| a: J)(xT|x|e i ) 



2|x| 2 T|x|(a; i +0 



±|x|e, ; . 



(11) 



This means, that if we choose x to contain the lower n— 1 
elements of the first column of A and set x\ = 0, = e 2 , 
then 



PiAPj = 



x 



V 



7 



(12) 



Note that the first row and the first column of this matrix 
are real even if A is not. In the next step, x contains the 
lower n — 2 elements of the second column of PiAPj, 
x\ = X2 = 0, and e, = e 3 , so that the second row (and 
column) is brought to the desired form, while the first 
remains unchanged. This is repeated n — 1 times, until 
A is fully tridiagonal and real. 

For the actual implementation of the Householder 
method, we do not calculate the matrix product PAPt 
directly, but instead evaluate 



p = w'Au, 

v w t 
K= 2 UP ' 

q = p — Ku. 



(13) 
(14) 
(15) 
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With these definitions, we have the final expression 

PAP 1 =P(A-pu t ) 

= A - pu f up f + 2Kuu ji 

= A — qu t — uq 1 '. (16) 

Note that in the last step we have made use of the fact 
that K is real, as can be seen from Eqs. (14) and (13), 
and from the hermiticity of A. 

2.2.2. The QL algorithm for real tridiagonal matrices 

The QL algorithm is based on the fact that any real 
matrix can be decomposed into an orthogonal matrix Q 
and a lower triangular matrix L according to 

A = QL. (17) 

Equivalently, one could also start from a decomposition 
of the form A = QR, with R being upper triangular, 
to obtain the QR algorithm, which has similar proper- 
ties. For tridiagonal A, the QL decomposition is most 
efficiently calculated by a sequence of plane rotations of 
the form of Eq. (3). The iteration prescription is 

A — > Q T AQ, (18) 

but to accelerate convergence it is advantageous to use 
the method of shifting, which means decomposing A — kl 
instead of A. In each step, k is chosen in such a way 
that the convergence to zero of the uppermost non-zero 
off-diagonal element of A is maximized (see [15] for a 
discussion of the shifting strategy and for corresponding 
convergence theorems). Since in practice, the subtraction 
of k from the diagonal elements of A can introduce large 
numerical errors, the most widely used form of the QL 
algorithm is one with implicit shifting, where not all of 
these differences need to be evaluated explicitly, although 
the method is mathematically equivalent to the ordinary 
QL algorithm with shifting. 

2.3. Efficiency and accuracy of the Jacobi and QL 
algorithms 

As we have mentioned before, one of the main ben- 
efits of the QL algorithm is its efficiency: For matrices 
of large dimension n, it requires w 30n 2 floating point 
operations (combined multiply/add) if only the eigenval- 
ues are to be computed, or 6n 3 operations if also the 
complex eigenvectors are desired [8]. Additionally, the 
preceding complex Householder transformation requires 
8n 3 /3 resp. 16n 3 /3 operations. In contrast, the complex 
Jacobi algorithm takes about 3n 2 to 5n 2 complex Jacobi 
rotations, each of which involves 12n operations for the 
eigenvalues, or 24n operations for the complete eigensys- 
tem. Therefore, the total workload is s=s 35n 3 - 60n 3 resp. 
70n 3 - 120n 3 operations. 



For the small matrices considered here, these estimates 
are not reliable. In particular, the QL method suf- 
fers from the fact that the first few eigenvalues require 
more iterations than the later ones, since for these the 
corresponding off-diagonal elements have already been 
brought close to zero in the preceding steps. Further- 
more, the operations taking place before and after the 
innermost loops will play a role for small matrices. Since 
these are more complicated for QL than for Jacobi, they 
will give an additional penalty to QL. For these reasons 
we expect the performance bonus of QL over Jacobi to 
be smaller for 3x3 matrices than for larger problems. 

The numerical properties of both iterative methods are 
independent of the matrix size and have been studied in 
great detail by others, so we will only give a brief overview 
here. For real symmetric positive definite matrices, Dem- 
mel and Veselic have shown that the Jacobi method is 
more accurate than the QL algorithm [16, 17]. In partic- 
ular, if the eigenvalues are distributed over many orders 
of magnitude, QL may become inaccurate for the small 
eigenvalues. In the example given in [17], the extremely 
fast convergence of this algorithm is also its main weak- 
ness: In the attempt to bring the lowest diagonal clement 
close to the smallest eigenvalue of the matrix, a difference 
of two almost equal numbers has to be taken. 

If the requirement of positive definitcness is omitted, 
one can also find matrices for which QL is more accurate 
than Jacobi [18]. 

3. NON-ITERATIVE ALGORITHMS 

3.1. Direct analytical calculation of the eigenvalues 

For 3x3 matrices, the fastest way to calculate the 
eigenvalues is by directly solving the characteristic equa- 
tion 

P(X) = \A - AI| = (19) 

If we write 





( an 


012 


ai3\ 
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023 
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^33/ 



Eq. (19) takes the form 

P(X) = A 3 + c 2 A 2 + Cl A + c = (21) 

with the coefficients 

C2 = -an - a 22 - a 33 , (22) 
Ci = ana 2 2 + 011033 + 022033 

-|a 12 | 2 -|o 13 | 2 -|o 23 | 2 , (23) 
c = aii|a 23 | 2 + a 22 |ai 3 | 2 + a 33 |ai 2 | 2 

- ana 22 a 33 - 2 Re(a* 13 ai 2 a 23 ) . (24) 
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To solve Eq. (21), we follow the method proposed by 
del Ferro, Tartaglia, and Cardano in the 16 th cen- 
tury [19]. In Appendix A wc will discuss two alternative 
approaches and show that they lead to the same algo- 
rithm as Cardano's method if numerical considerations 
are taken into account. 

Cardano's method requires first transforming Eq. (21) 
to the form 



by defining 



p = c\- 3ci, 

q = -^fc - c\ + §c 2 ci, 



t = 2p- 3/2 q, 

7p( 



(25) 

(26) 
(27) 
(28) 
(29) 



It is easy to check that a solution to Eq. (25) is then 
given by 



1 

X = h U, 

u 



with 




(30) 



(31) 



There is a sixfold ambiguity in u (two choices for the 
sign of the square root and three solutions for the com- 
plex cube root), but it reduces to the expected threefold 
ambiguity in x. 

To achieve optimal performance, we would like to avoid 
complex arithmetic as far as possible. Therefore, we will 
now show that ypp, and thus t, are always real. We know 
from linear algebra that the characteristic polynomial 
P(X) of the hermitian matrix A must have three real 
roots, which is only pos sible if t he stationary points of 
f (A), A1/2 = -\c 2 ±\y/ c| - 3ci = -\c 2 ±\ s /p are real. 
Since c 2 is real, this implies that also ^fp must be real. 

Furthermore, from the same argument, we have the re- 
quirement that -P(Ai) > > P(X 2 ), which in turn implies 
that — 2 < t < 2. Therefore, yJt 2 /A — 1 is always purely 
imaginary, and from this it is easy to see that |tt| = 1. 
Therefore we can write u — e 1 ^ , with 



larctan^^^arctanVZZZ 
3 i/2 3 q 

J27 

= I arctan — 



4 C 1 



Cl) + C (q+ f C )] 



(32) 



The last step is necessary to improve the numerical ac- 
curacy, which would suffer dramatically if we computed 
the difference p 3 — q 2 directly. 

When evaluating Eq. (32), care must be taken to cor- 
rectly resolve the ambiguity in the arctan by taking into 



account the sign of q: For q > 0, the solution must lie in 
the first quadrant, for q < it must be located in the sec- 
ond. In contrast to this, solutions differing by multiples 
of 2tt are equivalent, so x can take three different values, 



x\ = 2cos</>, 

x 2 = 2 cos + = — cos <A — ^3 sin </>, 
x 3 = 2 cos ^(f> — '-^j = ~ cos 4> + sin <j>. 
These correspond to the three eigenvalues of A: 



(33) 



X - &x- 



\c 2 . 



(34) 



Similar formulas have been derived previously in [9]. 

The most expensive steps of Cardano's algorithm are 
the evaluations of the trigonometric functions. Neverthe- 
less, the method is extremely fast, and will therefore be 
the best choice for many practical problems. However, 
from Eq. (34) we can see that it becomes unstable for ma- 
trices with largely different eigenvalues: In general, c 2 is 
of the order of the largest eigenvalue A max . Therefore, in 
order to obtain the smaller eigenvalues, considerable can- 
cellation between ^-Xi and ic 2 must occur, which can 
yield large errors and is very susceptible to tiny relative 
errors in the calculation of c 2 . 

In general, the roots of Eq. (21) can be very sensitive 
to small errors in the coefficients which might arise due 
to cancellations in Eqs. (22) - (24). 

If e is the machine precision, we can estimate the abso- 
lute accuracy of the eigenvalues to be of 0(eA max ), which 
may imply a significant loss of relative accuracy for the 
small eigenvalues. 

Consider for example the matrix 



'10 40 1 19 1 19N 
10 19 1 20 1 9 
10 19 10 9 1 



(35) 



which has the (approximate) eigenvalues 10 40 , 10 20 , and 
1. However, Cardano's method yields 10 40 , 5 • 10 19 , and 
— 5 - 10 19 . Note that for the matrix (35), also the QL algo- 
rithm has problems and delivers one negative eigenvalue. 
Only Jacobi converges to a reasonable relative accuracy. 
See [16] for a discussion of this. 



3.2. Direct analytical calculation of the 
eigenvectors 

Once the eigenvalues Xi of A have been computed, the 
eigenvectors v, can be calculated very efficiently by using 
vector cross products, which are a unique tool in three- 
dimensional space. 

The Vj satisfy by definition 



(A - A4) • v, = 0. 



(36) 
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Taking the hermitian conjugate of this equation and mul- 
tiplying it with an arbitrary vector x G C 3 , we obtain 

vj • (A - Xil) • x = 0. (37) 

In particular, if x is the j-th unit vector ej, this becomes 

vj • (A' - Xiej) = V j, (38) 

where A- 7 denotes the j-th column of A. Consequently, as 
long as A 1 — A,ei and A 2 — Aie 2 are linearly independent, 
we have 



v 4 = [(A 1 - A;ei) x (A 2 - A,e 2 ) 



(39) 



In the special case that A 1 — A^ei = n(A 2 — Aje 2 ), v, is 
immediately given by 



If we erroneously start the vector product algorithm with 
the approximate eigenvalues 10 20 , 10 20 , and 0.98, the er- 
ror that is introduced when subtracting Ai from the diag- 
onal elements is of order O(10 9 ) and thus comparable to 
the off-diagonal elements. Consequently, the calculated 
eigenvectors 



vi 





V; = | | ■ V, = | l/v/2 

1/V2J 
(43) 

are completely wrong. 

Another flaw of the vector product algorithm is the fact 
that the subtractions (A J — Ajej) and the subtractions 
in the evaluation of the cross products are very prone to 
cancellation errors. 



W o 



(40) 



When implementing the above procedure, care must 
be taken if there is a degenerate eigenvalue because in 
this case, the algorithm will only find one of the two 
corresponding eigenvectors. Therefore, if we detect a de- 
generate eigenvalue, say Ai = A 2 , we calculate the sec- 
ond eigenvector as the cross product of vi with one of 
the columns of A — Ail. In principle, this alternative 
formula would also work for non-degenerate eigenvalues, 
but we try to avoid it as far as possible because it abets 
the propagation of errors. On the other hand, we have 
to keep in mind that the test for degeneracy may fail if 
the eigenvalues have been calculated too inaccurately. If 
this happens, the algorithm will deliver wrong results. 

The calculation of the third eigenvalue can be greatly 
accelerated by using the formula Vg = vixv 2 . Of course, 
this is again vulnerable to error propagation, but it turns 
out that this is usually tolerable. 

For many practical purposes that require only moder- 
ate accuracy, the vector product algorithm is the method 
of choice because it is considerably faster than all other 
approaches. However, the limitations to its accuracy 
need to be kept in mind. First, the eigenvectors suffer 
from errors in the eigenvalues. Under certain circum- 
stances, these errors can even be greatly enhanced by 
the algorithm. For example, the matrix 



'10 20 10 9 10 9N 
10 9 10 20 10 9 
10 9 10 9 1 



(41) 



has the approximate eigenvalues (1 + 10~ n ) • 10 20 , (1 — 
10~ n ) • 10 20 , and 0.98. The corresponding eigenvectors 
are approximately 



vi 



v 2 




v 3 
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3.3. A hybrid algorithm 

To circumvent the cases where the cross product 
method fails or becomes too inaccurate, we have devised 
a hybrid algorithm, which uses the analytical calcula- 
tions from Sees. 3.1 and 3.2 as the default branch, but 
falls back to QL if this procedure is estimated to be too 
error-prone. The condition for the fallback is 



I 2 < 2 8 eA 2 



(44) 



Here, is the analytically calculated and yet unnormal- 
ized eigenvector from Eq. (39), e is the machine precision, 
A = max(A 2 nax , A max ) is an estimate for the largest num- 
ber appearing in the problem, and 2 8 is introduced as a 
safety factor. 

Since in typical problems only a small fraction of ma- 
trices will fulfill condition (44) , the hybrid approach can 
be expected to retain the efficiency of the analytical algo- 
rithm. In reality, it is even slightly faster because fewer 
conditional branches are used. 



3.4. Cuppen's Divide and Conquer algorithm 

In recent years, the "Divide and Conquer" paradigm 
for symmetric eigenproblems has received considerable 
attention. The idea was originally invented by Cup- 
pen [20], and current implementations arc faster than 
the QL method for large matrices [6]. One can estimate, 
that for 3x3 matrices, a divide and conquer strategy 
might also be beneficial, because it means splitting the 
problem into a trivial lxl and an analytically accessible 
2x2 problem. 

However, let us first discuss Cuppen's algorithm in its 
general form for n x n matrices. As a preliminary step, 
the matrix has to be reduced to symmetric tridiagonal 
form, as discussed in Sec. 2.2.1. The resulting matrix T 
is then split up in the form 



(42) 



Ti 
T 2 



+ H, 



(45) 



G 



where Ti and T2 are again tridiagonal, and H is a very 
simple rank 1 matrix, e.g. 



/0 



H 



(3 (3 
(3 



(46) 



0/ 



Then, the smaller matrices Ti and T2 are brought to the 
diagonal form Ti = QjD^Qf , so that T becomes 



QiDiQf 
Q 2 D 2 Q 2 n 



Qi 
Q 2 



Di 
D 2 



H 



H' 



QI 
QI 



(47) 



Here, H' = z z T is another rank 1 matrix with a gener- 
ating vector z consisting of the last row of Qi and the 
first row of Q 2 , both multiplied with (3. The remaining 
problem is to find an eigenvalue A and an eigenvector v, 
satisfying 







[(0' 


V2) 



zz J -AI 



v = 0. 



(48) 



By multiplying this equation from the left with 
z T • diag((Di - AI)" 1 , (D 2 - AI)" 1 ) and dividing off the 
scalar z T v, we obtain the characteristic equation in the 
form 



1 + 



E H 
di-X 



= 0, 



(49) 



and just need to be transformed back to the original basis 
by undoing the transformations Qi, Q 2 , and the tridiag- 
onalization. 

If implemented carefully, Cuppen's method can reach 
an accuracy comparable to that of the QL method. A 
major issue is the calculation of the differences di — Xj 
in the evaluation of the characteristic equation, Eq. (49), 
and in the calculation of the eigenvectors, Eq. (50) . To 
keep the errors as small as possible when solving for the 
eigenvalue Xj , we subtract our initial estimate for A^ from 
all di before starting the iteration. This ensures that 
the thus transformed eigenvalue is very close to zero and 
therefore small compared to the di. 

As we have mentioned before, the Divide and Con- 
quer algorithm is faster than the QL method for large 
matrices. It also required 0(n 3 ) operations, but since 
the expensive steps — the reduction to tridiagonal form 
and the back-transformation of the eigenvectors — are 
both outside the iterative loops, the coefficient of n 3 is 
significantly smaller. For the small matrices that we are 
considering, however, the most expensive part is solving 
the characteristic equation. Furthermore, many condi- 
tional branches are required to implement necessary case 
differentiations, to avoid divisions by zero, and to handle 
special cases like multiple eigenvalues. Therefore, we ex- 
pect the algorithm to be about as fast as the QL method. 

It is of course possible to reduce the calculational ef- 
fort at the expense of reducing the accuracy and stability 
of the algorithm, but it will always be slower than Car- 
dano's method combined with the vector product algo- 
rithm. 



where the di are the diagonal entries of Di and D 2 . 

There are several obvious strategies for solving this 
equation in the 3x3 case: First, by multiplying out the 
denominators we obtain a third degree polynomial which 
can be solved exactly as discussed in Sec. 3.1. This is very 
fast, but we have seen that it can be numerically unstable. 
Second, one could apply the classical Newton-Raphson 
iteration, which is fast and accurate in the vicinity of the 
root, but may fail altogether if a bad starting value is 
chosen. Finally, the fact that the roots of Eq. (49) are 
known to be separated by the singularities d\ , d 2 and d 3 
suggests the usage of a bisection method to obtain the 
eigenvalues very accurately, but with slow convergence. 

To get the optimum results, we apply Cardano's an- 
alytical method to get estimates for the roots, and 
refine them with a hybrid Newton-Raphson/Bisection 
method based on [8] . This method usually takes Newton- 
Raphson steps, but if the convergence gets too slow or if 
Newton-Raphson runs out of the bracketing interval, it 
falls back to bisection. 

The elements Ujj of the eigenvectors Vj of 
diag(D 1; D 2 ) + H' arc then obtained by the simple 
formula 



va = 



dj 



X, 



(50) 



4. OTHER ALGORITHMS 

Apart from the Jacobi, QL, Cuppcn, and vector prod- 
uct algorithms there are several other methods for finding 
the eigensystems of symmetric matrices. We will briefly 
outline some of them here, and give reasons why they are 
inappropriate for 3 x 3 problems. 



4.1. Iterative root finding methods 

In order to avoid the instabilities of Cardano's method 
which were discussed in Sec. 3.1, one can use an itera- 
tive root finding method to solve the characteristic equa- 
tion. Root bracketing algorithms like classical bisection 
or the Dekker-Brent method start with an interval which 
is known to contain the root. This interval is then itera- 
tively narrowed until the root is known with the desired 
accuracy. Their speed of convergence is fair, but they 
are usually superseded by the Newton-Raphson method 
which follows the gradient of the function until it finds 
the root. However, the Newton-Raphson algorithm is not 
guaranteed to converge in all cases. 

Although these problems can partly be circumvented 
by using a hybrid method like the one discussed in 
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Sec. 3.4, iterative root finders are still unable to find 
multiple roots, and these special cases would have to be 
treated separately. Furthermore, the accuracy is limited 
by the accuracy with which the characteristic polyno- 
mial can be evaluated. As we have already mentioned 
in Sec. 3.1, this can be spoiled by cancellations in the 
calculation of the coefficients Co, ci, and c 2 . 

4.2. Inverse iteration 

Inverse iteration is a powerful tool for finding eigen- 
vectors and improving the accuracy of eigenvalues. The 
method starts with some approximation Aj for the de- 
sired eigenvalue Aj, and a random vector b. One then 
solves the linear equation 

(A - A4) • v, = b (51) 

to find and approximation Vj for the eigenvector v,. An 
improved estimate for Xi is calculated by using the for- 
mula 

(A - Ajl) • Vj w (Aj - Aj) • v;. (52) 

We estimate that inverse iteration is impractical for small 
matrices because there are many special cases that need 
to be detected and handled separately. This would slow 
the computation down significantly. 

4.3. Vectorization 

In a situation where a large number N of small ma- 
trices needs to be diagonalized, and all these matrices 
are available at the same time, it may be advantageous 
to vectorize the algorithm, i.e. to make the loop from 1 
to N the innermost loop 1 . This makes consecutive op- 
erations independent of each other (because they affect 
different matrices), and allows them to be pipelined and 
executed very efficiently. 

A detailed discussion of this approach is beyond the 
scope of the present work, but our estimate is that, as 
long as only the eigenvalues are to be computed, a vec- 
torization of Cardano's method would be most benefi- 
cial, because this method requires only few performance- 
limiting conditional branches, so that the number of pro- 
cessor pipeline stalls is reduced to a minimum. However, 
the accuracy limitations discussed above would still ap- 
ply in the vectorized version. 

If we want to calculate the full eigensystem, a vec- 
torized vector product method can only give a limited 
performance bonus because in the calculation of the vec- 
tor products, many special cases can arise which need to 



be detected and treated separately. This renders efficient 
vectorization impossible. The same is true for Cuppcn's 
Divide and Conquer algorithm. On the other hand, the 
iterative methods are problematic if the required number 
of iterations is not approximately the same for all matri- 
ces. Then, only the first few iterations can be vectorized 
efficiently Afterwards, matrices for which the algorithm 
has not converged yet need to be treated separately. 



5. BENCHMARK RESULTS 

In this section we report on the computational per- 
formance and on the numerical accuracy of the above 
algorithms. For the iterative methods we use implemen- 
tations which are similar to those discussed in [8]. Ad- 
ditionally, we study the LAPACK implementation of the 
QL/QR algorithm [6] and the QL routine from the GNU 
Scientific Library [7]. For the analytical methods we use 
our own implementations. Some ideas in our Cuppcn 
routine are based on ideas realized in LAPACK. 

Note that we do not show results for the LAPACK im- 
plementation of a Divide and Conquer algorithm (routine 
ZHEEVD) because this algorithm falls back to QL for small 
matrices (n < 25) and would therefore not yield anything 
new. We also neglect the new LAPACK routine ZHEEVR 
because for the 3x3 problems considered here it is sig- 
nificantly slower than the other algorithms. 

We have implemented our code in C and Fortran, but 
here we will only discuss results for the Fortran version. 
We have checked that the C code yields a similar numeri- 
cal accuracy, but is about 10% slower. This performance 
deficit can be ascribed to the fact that C code is harder 
to optimize for the compiler. 

Our code has been compiled with the GNU compiler, 
using the standard optimization flag -03. We did not 
use any further compiler optimizations although we are 
aware of the fact that it is possible to increase the execu- 
tion speed by about 10% if options like -f fast-math are 
used to allow optimizations that violate the IEEE 754 
standard for floating point arithmetic. 

Our numerical tests were conducted in double precision 
arithmetic (64 bit, 15 - 16 decimal digits) on an AMD 
Opteron 250 (64-bit, 2.4 GHz) system running Linux. To 
maximize the LAPACK performance on this system, we 
used the highly optimized AMD Core Math Library for 
the corresponding tests. Note that on some platforms, 
in particular on Intel and AMD desktop processors, you 
may obtain a higher numerical accuracy than is reported 
here because these processors internally use 80 bit wide 
floating point registers. 

To measure the accuracy of the calculated eigensys- 
tems we use three different benchmarks: 



We thank Charles van Loan for drawing our attention to this 
possibility. 



• The relative difference of the eigenvalue A and the 
corresponding result of the LAPACK QL routine 
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ZHEEV, A LAPACK : 



Ax 



Real symmetric matrices 



A- A 



LAPACK 



^LAPACK 



(53) 



We use the LAPACK QL implementation as a ref- 
erence because it is very well tested and stable. 

The relative difference of the eigenvector v and the 
corresponding LAPACK result, V LAPACK : 



A 2 = 



,~ r LAPACK|| 
v \V2_ 

^LAPACK || 2 ' 



(54) 



where || • || 2 denotes the Euclidean norm. This def- 
inition of A2 is of course not meaningful for matri- 
ces with degenerate eigenvalues because for these, 
different algorithms might find different bases for 
the multidimensional cigenspaces. Even for non- 
degenerate eigenvalues, A2 is only meaningful if 
the same phase convention is chosen for v and 
v LAPACK . Therefore, when computing A2, we re- 
ject matrices with degenerate eigenvalues, and for 
all others we re-phase the eigenvectors in such a way 
that the component which has the largest modulus 
in V LAPACK is real. 

• The deviation of the eigenvalue/eigenvector pair 
(A, v) from its defining property Av = Av: 



A, = 



|Av- Av|| 

ii^ii 



(55) 



Note that for the algorithms considered here, || v|| 2 = 1, 
so the definitions of A 2 and A 3 can be slightly simplified. 

We have first verified the correctness of the algorithms 
by diagonalizing 10 7 random hermitian resp. symmetric 
matrices with integer entries from the interval [—10,10] 
and (automatedly) checking that Ai, A 2 and A 3 were as 
small as expected. We have repeated this test with log- 
arithmically distributed integer entries from the interval 
[0, 10 10 ]. Such matrices tend to have largely different 
eigenvalues and are therefore a special challenge to the 
algorithms. 

For the results that we are going to present here, we 
have used similar tests, but we have allowed the matrix 
entries to be real, which corresponds more closely to what 
is found in actual scientific problems. Furthermore, for 
the logarithmically distributed case we have changed the 
interval to [10" 5 ,10 5 ]. 



5.1. Performance 

The results of our timing tests are summarized in Ta- 
ble I. The most important observation from this table 
is the fact that the standard libraries are slower by a 
substantial factor than the specialized algorithms. This 



Algorithm 


Ei 


genvalucs 


Ei 


genvectors 




Lin. 


Log. 


Lin. 


Log. 


Jacobi 


12.0 


10.0 


12.9 


10.9 


QL 


8.3 


6.3 


9.0 


6.9 


Cuppen 


6.6 


8.7 


9.3 


11.5 


GSL 


14.1 


11.6 


18.8 


15.7 


LAPACK QL 


21.5 


19.3 


49.9 


45.2 


Analytical 


2.9 


3.0 


4.0 


4.1 


Hybrid 


3.0 


3.8 


3.4 


4.5 


Complex 


hermitian matrices 




Algorithm 


Ei 


genvalues 


Ei 


genvectors 




Lin. 


Log. 


Lin. 


Log. 


Jacobi 


17.0 


15.4 


21.0 


18.4 


QL 


10.8 


8.9 


13.5 


11.2 


Cuppen 


8.3 


9.8 


12.5 


14.1 


GSL 


20.4 


17.8 


38.2 


32.5 


LAPACK QL 


29.6 


26.7 


61.9 


57.6 


Analytical 


3.5 


3.7 


5.9 


6.0 


Hybrid 


3.7 


3.6 


4.8 


5.1 



Table I: Performance of different algorithms for calculating 
the eigenvalues and eigenvectors of symmetric or hermitian 
3x3 matrices. We show the running times in seconds for 
the calculation of only the eigenvalues (left) and of the com- 
plete eigensystems (right) of 10 7 random matrices. Further- 
more, we compare the cases of linearly and logarithmically 
distributed matrix entries. For the scenarios with real ma- 
trices, specialized versions of the algorithms have been used. 
This table refers to the Fortran version of our code; the C 
version is about 10% slower. 



is due to the fact that in both LAPACK and the GSL, 
only the innermost loops, which determine the perfor- 
mance for large scale problems, are optimized. Outside 
these loops, the libraries spend a lot of time evaluat- 
ing their parameters and preparing data structures. LA- 
PACK QL additionally takes some time to decide at run- 
time whether a QR or QL algorithm is more favorable. 

On the other hand, our implementations of the itera- 
tive algorithms have a very simple parameter set, they do 
not require any internal reorganization of data structures, 
avoid function calls as far as possible, and contain some 
optimizations specific to the 3x3 case. On the downside, 
they do not perform any pre-treatment of the matrix such 
as rescaling it to avoid overflows and underflows. We be- 
lieve that for pathological matrices, LAPACK may be 
more accurate than our QL method. 

It is interesting to observe that the iterative algorithms 
are slightly faster for matrices with logarithmically dis- 
tributed entries. The reason is that for many of these ma- 
trices, the off-diagonal entries are already small initially, 
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so fewer iterations are required for the diagonalization. 
This is one of the reasons why QL is always faster than 
Cuppen in the logarithmically distributed scenario, but 
may be slower in the linear case. Another reason for 
this is the fact that logarithmically distributed matrices 
are more likely to create situations in which the hybrid 
root finder, which is used to solve Eq. (49), converges 
very slowly. This happens for example if the analytically 
calculated starting values have large errors. 

However, even in favorable situations, neither QL nor 
Cuppen can compete with the analytical methods. These 
do not require any prc-trcatmcnt of the matrix (like the 
transformation to tridiagonal form in the case of QL and 
Cuppen), and though their implementation may look a 
bit lengthy due to the many special cases that can arise, 
the number of floating point operations that are finally 
executed for each matrix is very small. 

If we compare the performance for real symmetric vs. 
complex hermitian matrices, we find that purely real 
problems can be solved much more efficiently. This is 
especially true for the Jacobi algorithm since real Ja- 
cobi transformations are much cheaper than complex 
ones. For QL and Cuppen, the performance bonus is less 
dramatic because large parts of these algorithms always 
operate on purely real data structures. The Cardano 
and vector product algorithms contain complex arith- 
metic, but their speed is also limited by the evaluation of 
trigonometric functions (Cardano) and by several condi- 
tional branches (vector product), therefore they too do 
not benefit as much as the Jacobi method. 



5.2. Numerical accuracy 

The excellent performance of the analytical and hy- 
brid algorithms is relativized by the results of our accu- 
racy tests, which are shown in Table II. While for lin- 
early distributed matrix entries, all algorithms get close 
to the machine precision of about 2 • 10~ 16 , the Car- 
dano and vector product methods become unstable for 
the logarithmically distributed case. In particular, the 
large average values of A 3 > O(10~ 3 ) show that the cal- 
culated eigenvalues and eigenvectors often fail to fulfill 
their defining property Av = Av. This problem is miti- 
gated by the hybrid technique, but even this approach is 
still far less accurate than the Jacobi, QL, and Cuppen 
algorithms. For these, A3 is still of order 1CP 9 (QL & 
Cuppen) resp. 10~ 10 (Jacobi). The fact that Jacobi is 
more accurate than QL and Cuppen confirms our expec- 
tations from Sec. 2.3. 

Note that the values of Ai and A 2 for the LAPACK 
QL algorithm are zero in the case of complex matrices 
and extremely small for real matrices. The reason is that 
LAPACK QL is the reference algorithm used in the defi- 
nition of these quantities. In the case of real matrices, Ai 
and A 2 reveal the tiny differences between the LAPACK 
ZHEEV and DSYEV routines. 

It is interesting to observe that in the logarithmically 



distributed scenario, Ai and A 2 are systematically larger 
for real than for complex matrices. This does not have 
a deeper reason but is simply due to the fact that in the 
real case, there are fewer random degrees of freedom, so 
there is a higher chance for ill-conditioned matrices to 
occur. The effect is not visible in A 3 because there it is 
compensated by the fact that this quantity receives large 
contributions mainly when in the evaluation of Avj in 
Eq. (55), a multiplication of a large matrix entry with 
a small and thus relatively inaccurate vector component 
occurs. It follows from combinatorics that this is more 
likely to happen if A and v are complex. 

6. CONCLUSIONS 

In this article, we have studied the numerical three- 
dimensional cigcnproblem for symmetric and hermitian 
matrices. We have discussed the Jacobi, QL, and Cuppen 
algorithms as well as an analytical method using Car- 
dano's formula and vector cross products. Our bench- 
marks reveal that standard packages are very slow for 
small matrices. Optimized versions of the standard al- 
gorithms are a lot faster while retaining similar numer- 
ical properties, but even their speed is not competitive 
with that of the analytical methods. We have, however, 
seen that the latter have limited numerical accuracy in 
extreme situations. Moreover, they were not designed 
to avoid overflow and underflow conditions. To partly 
circumvent these problems, we have devised a hybrid 
algorithm, which employs analytical calculations as the 
standard branch, but falls back to QL if it estimates the 
problem to be ill-conditioned. 

Depending on what kind of problem is to be solved, we 
give the following recommendations: 

• The hybrid algorithm is recommended for prob- 
lems where computational speed is more important 
than accuracy, and the matrices arc not too ill- 
conditioned in the sense that their eigenvalues do 
not differ by more than a few orders of magnitude. 
For example, in the initial example of the neutrino 
oscillation Hamiltonian, where the physical uncer- 
tainties in the parameters are much larger than the 
numerical errors, the hybrid algorithm turns out to 
be the optimal choice [4]. 

• The QL algorithm is a good general purpose 
"black box" method since it is reasonably fast and 

- except in some very special situations like the 
example given in Eq. (35) — also very accurate. If 
speed is not an issue, one can use standard im- 
plementations of QL like the LAPACK function 
ZHEEV. For better performance we recommend sim- 
pler implementations like our function ZHEEVQ3 or 
the function tqli from Ref. [8], on which our rou- 
tine is based. 

• Cuppen's Divide and Conquer method can 
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Linearly distributed real matrix entries 





Ax 


A 2 


A 3 


Algorithm 


Avg. 


Max. 


Avg. 


Max. 


Avg. 


Max. 


Jacobi 


1.34 


10" 


15 


3.52 • 10" 9 


4.01 


10" 


16 


1.32 • 10" 


12 


2.01 


10" 


15 




1.02 • 10~ 8 


QL 


1.89 


10" 


15 


5.59 • 10" 9 


4.09 


10" 


16 


2.05 ■ 10" 


12 


3.58 


10" 


15 




1.24 • 10~ 8 


Cuppen 


1.95 


10" 


15 


9.53 ■ 10" 9 


6.83 


10" 


16 


1.80 • 10" 


12 


4.21 


10" 


15 




1.45 • 10~ 8 


GSL 


1.29 


io- 


15 


3.18 ■ 10" 9 


3.56 


10" 


16 


2.18 • 10" 


12 


2.40 


10" 


15 




5.02 • 10~ 9 


LAPACK QL 


5.80 


10" 


17 


3.61 • 10~ n 


3.17 


10" 


17 


6.10 ■ 10" 


13 


2.69 


10" 


15 




8.28 ■ 10~ 9 


Analytical 


1.87 


10" 


15 


9.53 ■ 10" 9 


6.19 


10" 


15 


1.80 ■ 10 


-8 


1.36 


10" 


14 




4.32 • 10~ 8 


Hybrid 


1.87 


10" 


15 


9.53 ■ 10" 9 


4.91 


10" 


15 


6.49 ■ 10 


-!) 


1.16 


10" 


14 




2.91 • 10~ 8 



Linearly distributed complex matrix entries 





Ax 


A 2 


A 3 


Algorithm 


Avg. 


Max. 


Avg. 


Max. 


Avg. 


Max. 


Jacobi 


1.96 • 10" 


15 




7.66 ■ 10" 


9 


4.64 ■ 10" 


16 




1.13 ■ 10" 


13 


1.42 


10" 


14 




3.44 • 10" 


7 


QL 


2.08 ■ 10" 


14 




5.46 • 10" 


7 


4.83 ■ 10" 


16 




8.16 • 10" 


14 


4.27 


10" 


14 




1.14 • 10" 


6 


Cuppen 


4.37 ■ 10" 


15 




6.54 ■ 10" 


-8 


6.60 • 10" 


16 




2.03 ■ 10" 


13 


3.95 


10" 


14 




1.03 • 10" 


6 


GSL 


8.01 • 10" 


15 




1.88 ■ 10" 


-7 


4.56 ■ 10" 


16 




8.36 • 10" 


14 


2.14 


10" 


14 




5.26 ■ 10" 


7 


LAPACK QL 


0.0 






0.0 




0.0 






0.0 




2.41 


10" 


14 




6.03 • 10" 


7 


Analytical 


4.19 ■ 10" 


15 




6.54 • 10" 


8 


5.66 ■ 10" 


16 




3.17- 10" 


11 


3.05 


10" 


14 




7.95 • 10" 


7 


Hybrid 


4.19 ■ 10" 


15 




6.54 ■ 10" 


-8 


5.56 ■ 10" 


16 




3.17- 10" 


11 


3.03 


10" 


14 




7.95 • 10" 


7 



Logarithmically distributed real matrix entries 





Ax 


A 2 


A 3 


Algorithm 


Avg. 


Max. 


Avg. 


Max. 


Avg. 


Max. 


Jacobi 


2.96 • 


10" 


10 




1.94 


10" 


-4 


3.05 


10" 


12 


3.91 


10 


-7 


8.16 • 


10" 


ii 


1.10 


10 


-4 


QL 


4.88 • 


10" 


10 




4.29 


10" 


-4 


2.59 


10" 


12 


7.14 


10 


-7 


1.03 


10" 


-9 


1.18 


10 


-3 


Cuppen 


4.28 • 


10" 


10 




4.29 


10" 


-4 


3.58 


10" 


12 


6.55 


10 


-7 


8.90 • 


10" 


10 


1.12 


10 


-3 


GSL 


1.86 • 


10" 


10 




1.62 


10" 


-4 


2.78 


10" 


12 


4.01 


10 


-7 


9.87- 


10" 


10 


2.04 


10 


-3 


LAPACK QL 


8.36 • 


10" 


12 




1.14 


10" 


-5 


1.28 


10" 


-13 


1.81 


10 


-7 


1.11 


10" 


-9 


1.18 


10 


-3 


Analytical 


1.87 


• 10 


-9 




7.20 


10" 


-3 


1.80 


• 10 


-7 


1.36 


10+° 


3.47 


10" 


-1 


1.07 


10+ 6 


Hybrid 


1.40 


• 10 


-9 




1.16 


10" 


-3 


3.84 


10" 


11 


2.03 


10 


-4 


2.19 


lQ- 


-4 


4.75 


10 


fl 



Logarithmically distributed complex matrix entries 





Ax 


A 2 


A 3 


Algorithm 


Avg. 


Max. 


Avg. 


Max. 


Avg. 


Max. 


Jacobi 


1.55 ■ 10" 


10 


1.64 ■ 10" 


-4 


2.23 • 10" 


13 


7.43 • 10" 


8 


1.19 • 10" 


10 


8.24 • 10 -5 


QL 


2.25 • 10" 


10 


6.84 ■ 10" 


-4 


1.96 • 10" 


13 


1.17- 10" 


7 


7.85 • 10" 


10 


5.93 • 10~ 4 


Cuppen 


2.03 ■ 10" 


10 


6.02 ■ 10" 


-4 


2.71 • 10" 


13 


1.30 ■ 10" 


7 


7.59 • 10" 


10 


5.86 ■ 10~ 4 


GSL 


1.06 ■ 10" 


10 


8.69 • 10" 


-5 


2.17 • 10" 


13 


1.30 ■ 10" 


7 


1.38 ■ 10 


-9 


1.15 • 10~ 3 


LAPACK QL 


0.0 




0.0 




0.0 




0.0 




1.27 • 10 


-9 


1.24 ■ 10~ 3 


Analytical 


1.16 ■ 10 


-9 


7.10 ■ 10" 


-4 


6.10 ■ 10" 


-9 


8.29 • 10" 


2 


2.88 ■ 10 


-3 


2.84 ■ 10+ 3 


Hybrid 


1.11 ■ 10 


-9 


6.84 ■ 10" 


-4 


8.55 • 10" 


12 


3.55 ■ 10" 


5 


1.15 • 10 


-4 


8.81 • 10 +1 



Table II: Numerical accuracy of different algorithms for calculating the eigenvalues and eigenvectors of symmetric or hermitian 
3x3 matrices. This table refers to the Fortran implementation, but we have checked that the values obtained with the C code 
are similar. 
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achieve an accuracy similar to that of the QL algo- 
rithm and may be slightly faster for complex prob- 
lems if the input matrix is not already close to di- 
agonal. The choice between Cuppen and QL will 
therefore depend on the details of the problem that 
is to be solved. 

• If the highest possible accuracy is desired, Jacobi's 
method is the algorithm of choice. It is extremely 
accurate even for very pathological matrices, but 
it is significantly slower than the other algorithms, 
especially for complex problems. 

• The purely analytical method is not recom- 
mended for practical applications because it is su- 
perseded by the hybrid algorithm. It is, however, of 
academic interest since it reveals both the strengths 
and the limitations of analytical eigensystem calcu- 
lations. 

Let us remark that in scenarios where the diganolization 
of a hermitian 3x3 matrix is only part of a larger prob- 
lem, it may be advantageous to choose a slower but more 
accurate algorithm because this may improve the con- 
vergence of the surrounding algorithm, thus speeding up 
the overall process. The final choice of diagonalization 
algorithm will always depend strongly on the details of 
the problem which is to be solved. 
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Appendix A: ALTERNATIVE DERIVATIONS OF 
CARDANO'S METHOD 



In this appendix, we will discuss two alternative so- 
lution strategies for the third degree polynomial equa- 
tions (21), and show that in the end, numerical consid- 
erations lead again to Eq. (33). 



1. A trigonometric approach 

If we substitute x = 2 cos | on the left hand side of 
Eq. (25) and use trigonometric transformations to obtain 

2cos(9 = i, (Al) 

we can show that the solutions to Eq. (25) can be written 
as 



Our previous result —2 < t < 2 (see Sec. 3.1) ensures 
that this is well-defined. If we replace the arccos by a 
numerically more stable arctan, we immediately recover 
Eq. (33). 



2. Lagrange resolvents 

The second alternative to Cardano's derivation that 
we are going to consider employs the concept of La- 
grange resolvents. We start from the observation that 
the coefficients of Eq. (21) can be expressed in terms of 
the roots Ai, A2, and A3 of -P(A), because we can write 
PW = lli=i 2 3( A ~ ^')- I n particular, Co, ci, and c 2 are 
the so-called elementary symmetric polynomials in Ai, 
A 2 , and A 3 : 

-C2 = X] Aj ' 

ci=^A 4 Aj, ( A3 ) 

i<j 

-c =n A * 

Next, we consider the Lagrange resolvents of Eq. (21), 
which are defined by 

ri = Ai + A 2 + A 3 , 

r 2 = \ 1 +e l h\ 2 +e- t h\^ (A4) 

r 3 = Ai + e - < f ,r A2 + e i § 7r A 3 . 

We observe, that rf is invariant under permutation of 
the Ai, and so, by the fundamental theorem of symmetric 
functions [21], can be expressed in terms of c , C\, and 
c 2 . Indeed, with the definitions from Eq. (29), we obtain 



~ c 2j 



q -p. 



rl = q- \/q 2 -p 3 . 



(A5) 



We can then recover Ai, A 2 , and A 3 according to 
= l(n +r 2 + r 3 ), 

A 2 = i(ri+e- 4 K 2 + e J K 3 ), (A6) 

A3 = !(ri+ e J K 2 + e- 4 K 3 ). 

For a practical implementation of these formulas, one 
would like to avoid complex arithmetic. This is possi- 
ble because we have seen before that sj q 2 — p 3 is always 
purely imaginary. This observation allows us to write 

n = VPe>», 

<<3 = VP"''*, 



x = 2 cos | = 2 cos I i arccos | 



(± arccos ±j . (A2) 



with (j) = \ arctan \Jp 3 — q 2 / q as before, and thus 
Ai = |(-c 2 + 2pcos0), 

A 2 = \{-c 2 — pcoscf) — V3/0 sin </>) , (A8) 
A 3 = |(— c 2 — pcoscf) + V3p sin cf>) . 
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These expressions are equivalent to Eq. (33) after substi- 
tuting back Eqs. (29), so the practical implementation of 
the Lagrange method is again identical to the previous 
algorithms. 



Appendix B: DOCUMENTATION OF THE C AND 
FORTRAN CODE 

Along with the publication of this article, we provide C 
and Fortran implementations of the algorithms discussed 
here for download. They are intended to be used for fur- 
ther numerical experiments or for the solution of actual 
scientific problems. 

Our C code follows the C99 standard which provides 
the complex data type double complex. In gec, this re- 
quires the usage of the compiler option -std=c99. The 
Fortran code is essentially Fortran 77, except for the 
fact that not all variable and function names obey the 
6-character limitation. 

Both versions of the code contain detailed comments, 
describing the structure of the routines, the purpose of 
the different functions, and their arguments. The C ver- 
sion also contains detailed information about local vari- 
ables, which was omitted in the Fortran version to keep 
the code compact. 

Our nomenclature conventions for functions and sub- 
routines may seem a bit cryptical because we tried to 
keep as close as possible to the LAPACK conventions: 
The first letter indicates the data type ("D" for double 
or "Z" for double complex), the second and third letters 
indicate the matrix type ( "SY" for symmetric and "HE" 
for hermitian) , while the remaining characters specify the 
purpose of the function: "EV" means eigenvalues and/or 
eigenvectors, "J" stands for Jacobi, "Q" for QL, "D" for 
Divide & Conquer (Cuppen), "V" for vector product, and 
"C" for Cardano. We also add the suffix "3" to indicate 
that our routines are designed for 3x3 matrices. 

In the following we will describe the interface of the 
individual routines. We will discuss only those func- 
tions which are relevant to the complex case, because 
their real counterparts are similar, with the data types 
C0MPLEX*16 rcsp. double complex being replaced by 
DOUBLE PRECISION resp. double. 

Furthermore, we will only discuss the Fortran code 
here because the corresponding C functions have iden- 
tical names and arguments. For example the Fortran 
subroutine 

SUBROUTINE ZHEEVJ3(A, Q, W) 
C0MPLEX*16 A(3, 3) 
C0MPLEX*16 Q(3, 3) 
DOUBLE PRECISION W(3) 

corresponds to the C function 

int zheevj 3 (double complex A[3][3], 
double complex Q [3] [3] , double w[3]). 



1. Main driver function 

SUBROUTINE ZHEEVJ3(A, Q, W) 
C0MPLEX*16 A(3, 3) 
C0MPLEX*16 Q(3, 3) 
DOUBLE PRECISION W(3) 

This routine uses Jacobi's method (see Sec. 2.1) to find 
the eigenvalues and normalized eigenvectors of a hermi- 
tian 3x3 matrix A. The eigenvalues are stored in W, while 
the eigenvectors are returned in the columns of Q. 

The upper triangular part of A is destroyed during the 
calculation, the diagonal elements are read but not de- 
stroyed, and the lower triangular elements are not refer- 
enced at all. 

SUBROUTINE ZHEEVQ3(A, Q, W) 
C0MPLEX*16 A(3, 3) 
C0MPLEX*16 Q(3, 3) 
DOUBLE PRECISION W(3) 

This is our implementation of the QL algorithm from 
Sec. 2.2. It finds the eigenvalues and normalized eigen- 
vectors of a hermitian 3x3 matrix A and stores them in 
W and in the columns of Q. 

The function accesses only the diagonal and upper tri- 
angular parts of A. The access is read-only. 

SUBROUTINE ZHEEVD3(A, Q, W) 
C0MPLEX*16 A(3, 3) 
C0MPLEX*16 Q(3, 3) 
DOUBLE PRECISION W(3) 

This is Cuppen's Divide and Conquer algorithm, op- 
timized for 3-dimensional problems (see Sec. 3.4). The 
function assumes A to be a hermitian 3x3 matrix, and 
calculates its eigenvalues W^, as well as its normalized 
eigenvectors. The latter are returned in the columns of 
Q. 

The function accesses only the diagonal and upper tri- 
angular parts of A. The access is read-only. 

SUBROUTINE ZHEEVC3(A, W) 
COMPLEX* 16 A (3, 3) 
DOUBLE PRECISION W(3) 

This routine calculates the eigenvalues Wj of a hermi- 
tian 3x3 matrix A using Cardano's analytical algorithm 
(see Sec. 3.1). Only the diagonal and upper triangular 
parts of A are accessed, and the access is read-only. 

SUBROUTINE ZHEEVV3(A, Q, W) 
COMPLEX* 16 A (3, 3) 
COMPLEX* 16 Q(3, 3) 
DOUBLE PRECISION W(3) 

This function first calls ZHEEVC3 to find the eigenvalues 
of the hermitian 3x3 matrix A, and then uses vector 
cross products to analytically calculate the normalized 
eigenvectors (see Sec. 3.2). The eigenvalues are stored in 
W, the normalized eigenvectors in the columns of Q. 
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Only the diagonal and upper triangular parts of A need 
to contain meaningful values, but all of A may be used 
as temporary storage and might hence be destroyed. 

SUBROUTINE ZHEEVH3(A, Q, W) 
COMPLEX* 16 A (3, 3) 
COMPLEX* 16 Q(3, 3) 
DOUBLE PRECISION W(3) 

This is the hybrid algorithm from Sec. 3.3. Its default 
behavior is identical to that of ZHEEVV3, but under cer- 
tain circumstances, it falls back to calling ZHEEVQ3. As 
for the other routines, A has to be a hermitian 3x3 ma- 
trix, and the eigenvalues and eigenvectors are stored in W 
resp. in the columns of Q. 

Only the diagonal and upper triangular parts of A need 
to contain meaningful values, and access to A is read-only. 



2. Helper functions 

SUBROUTINE DSYEV2(A, B, C, RT1 , RT2 , CS, SN) 
DOUBLE PRECISION A, B, C 
DOUBLE PRECISION RT1, RT2, CS, SN 

This subroutine calculates the eigenvalues and eigen- 
vectors of a real symmetric 2x2 matrix 



The result satisfies 




(Bl) 




CS SN 
-SN CS 




CS -SN 
SN CS , 



(B2) 



and RT1 > RT2. Note that this convention is different 
from the convention used in the corresponding LAPACK 
function DLAEV2, where |RT1| > |RT2|. We use a different 
convention here because it helps to avoid several condi- 
tional branches in ZHEEVD3 and DSYEVD3. 



SUBROUTINE ZHETRD3(A, Q, D, E) 
COMPLEX* 16 A (3, 3) 
COMPLEX* 16 Q(3, 3) 
DOUBLE PRECISION D(3) 
DOUBLE PRECISION E(2) 

This routine reduces a hermitian matrix A to real tridi- 
agonal form by applying a Householder transformation Q 
according to Sec. 2.2: 




(B3) 



The function accesses only the diagonal and upper trian- 
gular parts of A. The access is read-only. 
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